A Data-Driven Approach Using the NFL Big Data Bowl Dataset and Advanced Machine Learning Techniques
Rows: 393,536
Technique: Group Splitting (Game ID / Play ID)
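Group splitting keeps every row from the same game (or play) on one side of the split, so frames from a single play cannot leak from the training set into the test set. The following is a minimal plain-Python sketch of the idea, not the project's actual code. It assumes rows are dictionaries with a hypothetical `gameId` key; the function name, the 80/20 ratio, and the seed are all illustrative.

```python
import random

def group_train_test_split(rows, group_key="gameId", test_fraction=0.2, seed=42):
    """Split rows so that all rows sharing a group value land on the same side."""
    # Shuffle the distinct group values, not the rows themselves.
    groups = sorted({row[group_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in rows if r[group_key] not in test_groups]
    test = [r for r in rows if r[group_key] in test_groups]
    return train, test

# Toy data (hypothetical): 5 games, 2 plays each, 3 frames per play.
rows = [
    {"gameId": g, "playId": p, "frameId": f}
    for g in range(1, 6) for p in (1, 2) for f in (1, 2, 3)
]
train, test = group_train_test_split(rows)
```

Splitting on the group key rather than on individual rows is what prevents near-duplicate frames from the same play appearing in both sets.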
Factors to Consider:
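- Tackle indicator (0/1)
- Future X/Y position
- Speed, acceleration, orientation, and direction (S/A/O/Dir) of the defender
- Position / alignment cluster interaction
- Number of defenders in the box
- Current and future (0.5 seconds ahead) location of the ball
- Speed, acceleration, orientation, and direction (S/A/O/Dir) of the ball carrier
- Velocity/direction difference
- Ball in the defensive player's 'fan'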
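Several of the factors above are geometric and can be derived from the tracking columns. The sketch below shows one possible way to compute a 0.5-second position projection, a heading difference, and a 'fan' check. It rests on assumptions not stated in the source: angles are in degrees with 0 pointing along +y and increasing clockwise, speed is in yards per second, and the 'fan' is modeled as a cone in front of the defender whose half-angle and range (`half_angle_deg`, `max_range`) are illustrative guesses. All function names are hypothetical.

```python
import math

def project_position(x, y, speed, direction_deg, dt=0.5):
    """Straight-line projection dt seconds ahead at constant speed.

    Assumes 0 degrees points along +y and angles grow clockwise.
    """
    theta = math.radians(direction_deg)
    return x + speed * math.sin(theta) * dt, y + speed * math.cos(theta) * dt

def angle_difference(a_deg, b_deg):
    """Smallest absolute difference between two headings, in [0, 180]."""
    diff = abs(a_deg - b_deg) % 360.0
    return 360.0 - diff if diff > 180.0 else diff

def in_fan(def_x, def_y, def_orientation_deg, ball_x, ball_y,
           half_angle_deg=45.0, max_range=10.0):
    """True if the ball lies inside a cone in front of the defender.

    half_angle_deg and max_range are illustrative guesses, not project values.
    """
    dx, dy = ball_x - def_x, ball_y - def_y
    distance = math.hypot(dx, dy)
    if distance == 0.0:
        return True
    if distance > max_range:
        return False
    # Bearing from defender to ball in the same convention (0 deg = +y, clockwise).
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    return angle_difference(bearing, def_orientation_deg) <= half_angle_deg

future_x, future_y = project_position(x=30.0, y=20.0, speed=6.0, direction_deg=90.0)
heading_gap = angle_difference(350.0, 10.0)
ball_ahead = in_fan(30.0, 20.0, 90.0, ball_x=35.0, ball_y=21.0)
ball_behind = in_fan(30.0, 20.0, 90.0, ball_x=25.0, ball_y=20.0)
```

The wrap-around handling in `angle_difference` matters because headings of 350 and 10 degrees are only 20 degrees apart, not 340.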
Concerns:
\[\text{Minimize } \left\{ \frac{1}{N} \sum_{i=1}^{N} (y_i - \mathbf{x}_i^T \boldsymbol{\beta})^2 + \lambda \left[ \frac{1 - \alpha}{2} \|\boldsymbol{\beta}\|_2^2 + \alpha \|\boldsymbol{\beta}\|_1 \right] \right\}\]
The best parameters for the penalized (elastic net) regression are:
Lambda = 0.00011
Alpha = 0.6723358
Accuracy of 92.44%
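To make the roles of lambda and alpha in the objective above concrete, here is a small plain-Python sketch that evaluates the same expression term by term. It is only an illustration of the formula, not the fitting routine used in the project; the function name and the toy data are hypothetical, while the lambda and alpha values are the tuned ones reported above.

```python
def elastic_net_objective(X, y, beta, lam, alpha):
    """Mean squared error plus the elastic net penalty.

    alpha = 1 gives a pure L1 (lasso) penalty, alpha = 0 a pure L2 (ridge) penalty.
    """
    n = len(y)
    mse = sum(
        (y_i - sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))) ** 2
        for x_i, y_i in zip(X, y)
    ) / n
    l2_squared = sum(b * b for b in beta)
    l1 = sum(abs(b) for b in beta)
    penalty = lam * ((1.0 - alpha) / 2.0 * l2_squared + alpha * l1)
    return mse + penalty

# Toy data (hypothetical); lam and alpha are the tuned values reported above.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 0.0, 1.0]
beta = [1.0, 0.0]
objective = elastic_net_objective(X, y, beta, lam=0.00011, alpha=0.6723358)
```

With alpha near 0.67, roughly two thirds of the penalty weight falls on the L1 term, which pushes some coefficients to exactly zero, while the small lambda keeps the overall shrinkage light.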
The best parameters for the random forest are:
Mtry = 7
Min_n = 6
Trees = 278
Accuracy of 92.87%.
The best parameters for extreme gradient boosting are:
Trees = 219
Min_n = 9
Tree Depth = 1
Learn Rate = 1.2
Loss Reduction = 24
Sample Size = 1
Accuracy of 92.87%.
from tensorflow.keras.layers import BatchNormalization, Dense, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.regularizers import l2


def build_model(input_shape):
    # input_shape: number of input features (an integer)
    model = Sequential([
        Dense(64, activation='relu', input_shape=[input_shape], kernel_regularizer=l2(0.001)),
        BatchNormalization(),  # normalizes layer inputs to stabilize and accelerate training
        Dropout(0.3),  # randomly deactivates neurons to reduce overfitting
        Dense(64, activation='relu', kernel_regularizer=l2(0.001)),
        BatchNormalization(),
        Dropout(0.3),
        Dense(64, activation='relu', kernel_regularizer=l2(0.001)),
        BatchNormalization(),
        Dropout(0.3),
        # single sigmoid unit for the binary outcome; L2 regularization applied here as well
        Dense(1, activation='sigmoid', kernel_regularizer=l2(0.001))
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model
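\(\text{TOE} = \sum_{i=1}^{N} \left( \mathbb{I}_{\text{tackle}_i} - P(\text{tackle}_i) \right)\)
Where:
- \(\mathbb{I}_{\text{tackle}_i}\) is 1 if the defender made the tackle on observation \(i\) and 0 otherwise
- \(P(\text{tackle}_i)\) is the model's predicted probability of a tackle on observation \(i\)
- \(N\) is the number of observations for the defender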
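The sum above can be accumulated per defender to produce the TOE values shown in the tables. The sketch below is a minimal illustration under stated assumptions: each record is a hypothetical `(player, tackle indicator, predicted probability)` tuple, and the player names and probabilities are made up for the example rather than taken from any model.

```python
from collections import defaultdict

def tackles_over_expected(records):
    """Sum (actual tackle indicator - predicted tackle probability) per player."""
    totals = defaultdict(float)
    for player, made_tackle, predicted_prob in records:
        totals[player] += made_tackle - predicted_prob
    return dict(totals)

# Hypothetical (player, tackle indicator, predicted probability) records.
records = [
    ("Defender A", 1, 0.25),
    ("Defender A", 1, 0.50),
    ("Defender A", 0, 0.25),
    ("Defender B", 0, 0.75),
    ("Defender B", 1, 0.75),
]
toe = tackles_over_expected(records)
```

A positive total means the defender made more tackles than the model expected from the situations faced; a negative total means fewer.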
Penalized Regression (Accuracy: 92.68%)

| Display Name | TOE | Position |
|---|---|---|
| Talanoa Hufanga | 6.00 | SS |
| Jonathan Owens | 4.35 | FS |
| Cameron Jordan | 4.24 | DE |
| Kevin Byard | 4.14 | FS |
| Dre Greenlaw | −3.22 | ILB |
| Willie Gay | −3.26 | OLB |
| Cody Barton | −3.98 | MLB |
| Demario Davis | −4.56 | MLB |

Random Forest (Accuracy: 92.87%)

| Display Name | TOE | Position |
|---|---|---|
| Talanoa Hufanga | 4.92 | SS |
| Maxx Crosby | 3.89 | DE |
| Jonathan Owens | 3.87 | FS |
| Cameron Jordan | 3.79 | DE |
| Xavier McKinney | −2.80 | FS |
| Demario Davis | −2.91 | MLB |
| Damien Wilson | −3.15 | MLB |
| Cody Barton | −3.76 | MLB |

Extreme Gradient Boosting (Accuracy: 92.44%)

| Display Name | TOE | Position |
|---|---|---|
| Talanoa Hufanga | 6.12 | SS |
| Jonathan Owens | 4.58 | FS |
| Maxx Crosby | 4.16 | DE |
| Cameron Jordan | 4.03 | DE |
| Damien Wilson | −3.66 | MLB |
| Christian Kirksey | −3.97 | OLB |
| Cody Barton | −5.08 | MLB |
| Demario Davis | −5.57 | MLB |

Neural Net (Accuracy: 92.92%)

| Display Name | TOE | Position |
|---|---|---|
| Jonathan Owens | 6.10 | FS |
| C.J. Mosley | 3.74 | ILB |
| Jihad Ward | 3.69 | OLB |
| Grover Stewart | 3.45 | DT |
| Marcus Davenport | −1.04 | DE |
| Steven Nelson | −1.13 | CB |
| Marshon Lattimore | −1.14 | CB |
| Julian Love | −1.31 | SS |